The inclusion of fermionic loop contributions in Numerical Stochastic Perturbation Theory (NSPT) has a nice feature: it does not cost much (provided only that an FFT can be implemented in a fairly efficient way). Focusing on Lattice SU(3), we report on the performance of the current implementation of the algorithm and on the status of the first computations undertaken.



1. Introduction 

At Lattice 2000 we discussed how to include fermionic loop contributions in Numerical Stochastic Perturbation Theory for Lattice SU(3), an algorithm which we will refer to as UNSPT (Unquenched NSPT). Our main message here is that unquenching NSPT does not result in a heavy computational overhead, provided only that an FFT can be implemented in a fairly efficient way. The FFT is the main ingredient in constructing the fermion propagator by inverting the Dirac kernel order by order. For a discussion of the foundations of UNSPT we refer the reader to [...].
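To make the role of the FFT concrete, here is a minimal sketch under strong simplifying assumptions. A one-dimensional scalar toy kernel (`apply_m0`) stands in for the free Wilson-Dirac operator, and the interaction terms are replaced by hypothetical local fields (`potentials`) rather than by expanded gauge links. What the sketch is meant to show is the structure. The free kernel is diagonal in momentum space, so each order needs only one forward and one inverse FFT, with the lower orders feeding the source term. All names are illustrative, not the authors' code.

```python
import cmath
import math


def fft(values, inverse=False):
    """Recursive radix-2 Cooley-Tukey FFT; len(values) must be a power of two.

    The inverse transform is left unnormalised: the caller divides by n.
    """
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    sign = 1.0 if inverse else -1.0
    even = fft(values[0::2], inverse)
    odd = fft(values[1::2], inverse)
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(sign * 2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def apply_m0(psi, mass):
    """Toy free kernel (hypothetical): (mass + 2) psi(x) - psi(x+1) - psi(x-1), periodic."""
    n = len(psi)
    return [(mass + 2) * psi[x] - psi[(x + 1) % n] - psi[(x - 1) % n]
            for x in range(n)]


def apply_m0_inverse(rhs, mass):
    """Invert the toy free kernel, which is diagonal in momentum space."""
    n = len(rhs)
    rhs_p = fft(rhs)
    # Eigenvalue of apply_m0 on the plane wave with momentum 2*pi*p/n.
    sol_p = [rhs_p[p] / (mass + 2 - 2 * math.cos(2 * math.pi * p / n))
             for p in range(n)]
    return [v / n for v in fft(sol_p, inverse=True)]


def solve_order_by_order(eta, mass, potentials, order):
    """Solve (M0 + sum_k g^k V_k) psi = eta with psi = sum_n g^n psi_n.

    potentials[k-1] is a hypothetical local field V_k(x) multiplying g^k.
    Each order only needs M0^{-1}:
        psi_n = M0^{-1} (eta * delta_{n0} - sum_k V_k psi_{n-k}).
    """
    size = len(eta)
    psi = []
    for n in range(order + 1):
        rhs = [complex(e) for e in eta] if n == 0 else [0j] * size
        for k in range(1, min(n, len(potentials)) + 1):
            field, lower = potentials[k - 1], psi[n - k]
            rhs = [r - field[x] * lower[x] for x, r in enumerate(rhs)]
        psi.append(apply_m0_inverse(rhs, mass))
    return psi


if __name__ == "__main__":
    L = 8
    source = [1.0 if x == 0 else 0.0 for x in range(L)]
    fields = [[0.1 * math.cos(x) for x in range(L)],
              [0.05 * x for x in range(L)]]
    coefficients = solve_order_by_order(source, 0.5, fields, 3)
    print([round(abs(c[0]), 6) for c in coefficients])
```

Each order reuses the same diagonal inverse, so the cost per order is dominated by the transforms, which scale as $V \log V$, rather than by a dense matrix inversion.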



2. Lattice SU(3) UNSPT on APEmille 

The need for an efficient FFT is what forced us to wait for APEmille: our FFT implementation mimics [...], which is based on a one-dimensional FFT plus transpositions, an operation which requires local addressing on a parallel architecture. UNSPT has been implemented both in single and in double precision, the former being remarkably robust for applications like Wilson loops. To estimate the computational overhead of unquenching NSPT one can inspect Table 1, where we report execution times of a fixed number of sweeps for both quenched and unquenched NSPT. In both columns the growth of computational time is consistent with the fact that every operation is performed order by order. In each row the growth due to unquenching is roughly consistent with a factor 5/3. One then wants to understand the dependence on the volume, which is the critical one, the propagator being the inverse of a matrix: this is exactly the growth which has to be tamed by the FFT. One should compare execution times at a given order on L = 8 and L = 16 lattices. Note that L = 8 is simulated on an APEmille board (8 FPUs), while L = 16 is simulated on an APEmille unit (32 FPUs). The lattice volume thus grows by a factor $(16/8)^4 = 16$ and the number of FPUs by a factor 4, so that, assuming ideal parallel efficiency, linear scaling with the volume corresponds to execution times about 4 times longer on L = 16. Taking this into account, one easily sees that the FFT is doing its job: the simulation time scales with the volume also for UNSPT (a result which is trivial for quenched NSPT). Notice that at this level we have only compared crude execution times; a careful inspection of autocorrelations is not going to jeopardize the picture anyway. As for the dependence on $N_f$ (the number of flavours), it is a parametric one: one plugs in various values and then proceeds to fit the polynomial in $N_f$ whose degree is fixed by the order of the computation. It is then reassuring to observe the quick response to a change in $N_f$ shown in Figure 1 (the signal for the second order of the plaquette at a given value of the hopping parameter K).
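The scheme of "one-dimensional FFT plus transpositions" can be illustrated serially in two dimensions. This sketch is not the APEmille code. The names `fft1d`, `transpose` and `fft2d` are hypothetical, and in four dimensions the same step (transform along the local direction, then transpose) would be repeated once per direction. The transposition is the data-movement step, which according to the text is the operation that requires local addressing on a parallel machine.

```python
import cmath
import math


def fft1d(values):
    """Recursive radix-2 FFT along one direction (length a power of two)."""
    n = len(values)
    if n == 1:
        return [complex(values[0])]
    even, odd = fft1d(values[0::2]), fft1d(values[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def transpose(grid):
    """Swap the two lattice directions of a row-major 2D array."""
    return [list(column) for column in zip(*grid)]


def fft2d(grid):
    """2D FFT: 1D FFTs on rows, transpose, 1D FFTs on rows again, transpose back."""
    rows_done = [fft1d(row) for row in grid]
    cols_done = [fft1d(row) for row in transpose(rows_done)]
    return transpose(cols_done)


def dft2d(grid):
    """Direct O(V^2) reference transform, only for checking fft2d."""
    nx, ny = len(grid), len(grid[0])
    return [[sum(grid[x][y] * cmath.exp(-2j * math.pi * (kx * x / nx + ky * y / ny))
                 for x in range(nx) for y in range(ny))
             for ky in range(ny)]
            for kx in range(nx)]


if __name__ == "__main__":
    field = [[float((x + 2 * y) % 3) for y in range(8)] for x in range(4)]
    fast, slow = fft2d(field), dft2d(field)
    print(max(abs(fast[i][j] - slow[i][j]) for i in range(4) for j in range(8)))
```

Only one-dimensional transforms along the locally stored direction are ever needed. The price is the transpositions, whose efficiency on a given architecture decides whether the whole scheme pays off.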



3. Benchmark computation I: Wilson loops

We now proceed to discuss some benchmark computations. A typical one is given by Wilson loops. In Figure 2 one can inspect the first five orders¹ of the basic plaquette at a given value of the hopping parameter K, for which analytic results can be found in [...]: going even higher in order would be trivial at this stage². Apart from being an easy benchmark, Wilson loops interest us for two reasons. First of all, we are completing the unquenched computation of the Lattice Heavy Quark Effective Theory Residual Mass (see [...] for the quenched result). On top of that, we are keeping an eye on the issue of whether the growth of the coefficients of the plaquette can be explained in terms of renormalons. There is an ongoing debate about that (see [...]), the other group involved having also started to make use of NSPT. In the renormalon framework the effect of $N_f$ can be easily inferred from the $\beta$-function, eventually turning the series into one with oscillating signs.

¹ All our expansions are written in powers of $\beta^{-1}$.
² Notice, however, that these computations are performed at a given value of the hopping parameter K, but with no mass counterterm (see Section 4).

Table 1. Execution times of a fixed number of sweeps for quenched and unquenched NSPT (see main text for details).

Figure 1. The effect of changing the number of flavours (on the fly).
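The Figure 2 caption notes that results obtained at several Langevin time steps $\tau$ have to be extrapolated to $\tau \to 0$. Below is a minimal sketch of that step. It assumes that the leading step-size error is a single power of $\tau$ (linear by default, as for a first-order integrator), and it ignores statistical weights. The helper `extrapolate_to_zero_step` is hypothetical, and the numbers used with it are synthetic rather than data from this work.

```python
def extrapolate_to_zero_step(taus, values, power=1):
    """Least-squares fit values = a + b * tau**power and return (a, b).

    a is the tau -> 0 estimate; power=1 assumes O(tau) step-size errors.
    Unweighted fit: a real analysis would weight points by their errors.
    """
    if len(taus) != len(values) or len(taus) < 2:
        raise ValueError("need at least two (tau, value) pairs")
    xs = [t ** power for t in taus]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


if __name__ == "__main__":
    # Synthetic values lying exactly on 2 + 3 * tau.
    intercept, slope = extrapolate_to_zero_step([0.01, 0.02, 0.03],
                                                [2.03, 2.06, 2.09])
    print(intercept, slope)
```

With the synthetic points above the intercept comes out as 2 up to rounding. For an integrator with $O(\tau^2)$ errors one would pass `power=2` instead.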

4. Benchmark computation II: the Wilson fermion Critical Mass

In Figure 3 we show the signal for the one-loop order of the Critical Mass for Wilson fermions (two-loop results are available from [...]). The computation is performed in the most standard way in Perturbation Theory, i.e. by inspecting the pole in the propagator at zero momentum. This is already a tough computation: it is a zero mode, so an IR mass cutoff is needed and the volume extrapolation is not trivial. On top of that, one should keep in mind that gauge fixing is also required. The coefficients which are known analytically can be reproduced. Still, one would like to change strategy in order to go to higher orders (which is a prerequisite for all other high-order computations). The reason is clear: we have actually been measuring the propagator $S$, while the physical information is coded in $\Gamma_2 = S^{-1}$ (one needs to invert the series, and huge cancellations are on their way). Notice anyway that, since the Critical Mass is already known to two loops, many interesting computations are already feasible.
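Order by order, inverting the series for $S$ to obtain $\Gamma_2 = S^{-1}$ is the standard recursion for the reciprocal of a power series. The sketch below treats the coefficients as plain numbers. That is an assumption, since in practice they are matrices in Dirac and colour space and carry statistical errors. The name `invert_series` is hypothetical. Each $\gamma_n$ is built from products of lower-order coefficients, and when these products are much larger than their sum the noise on the measured $s_k$ is amplified. This is the cancellation problem mentioned above.

```python
def invert_series(s, order):
    """Coefficients gamma_0..gamma_order of 1/S(g), with S(g) = sum_n s[n] g**n.

    gamma_0 = 1/s_0 and gamma_n = -(1/s_0) * sum_{k=1..n} s_k gamma_{n-k};
    coefficients of S beyond len(s) - 1 are taken to be zero.
    """
    if s[0] == 0:
        raise ValueError("leading coefficient must be non-zero")
    gamma = [1.0 / s[0]]
    for n in range(1, order + 1):
        acc = sum(s[k] * gamma[n - k] for k in range(1, min(n, len(s) - 1) + 1))
        gamma.append(-acc / s[0])
    return gamma


if __name__ == "__main__":
    # 1 / (1 - g) = 1 + g + g^2 + ...
    print(invert_series([1.0, -1.0], 4))
```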

5. Conclusions 

Benchmark computations in UNSPT look promising, since the computational overhead of including fermionic loop contributions is not so large. This is to be contrasted with the heavy computational effort required by non-perturbative unquenched lattice QCD. This in turn suggests the strategy of going back to perturbation theory for the (unquenched) computation of quantities like improvement coefficients and renormalisation constants. The Critical Mass being already known to two loops, many of these computations are already feasible at second order. We have only discussed the implementation of the algorithm on the APEmille architecture. We can also rely on a C++ implementation for PCs (clusters), which is now at the final stage of development.

Figure 2. First five orders of the basic plaquette for K = 0.16 and $N_f = 2$ on an L = 8 lattice. The x-axis shows different values of the time step $\tau$ used in integrating the Langevin equation; the result has to be extrapolated to $\tau \to 0$. Analytic results in infinite volume for the second and third orders (the first one is independent of sea fermions) are marked with a different symbol and are hardly distinguishable.

Figure 3. The signal for the first order of the Critical Mass on an L = 8 lattice with a cutoff mass of 0.01, at a given value of the time step used in the integration of the Langevin equation. The analytic result on the same volume and with the same cutoff is 2.557...